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Abstract 

When an unbiased estimator of the likelihood is used within an Markov chain Monte Carlo 
(MCMC) scheme, it is necessary to tradeoff the number of samples used against the computing 
time. Many samples for the estimator will result in a MCMC scheme which has similar properties 
to the case where the likelihood is exactly known but will be expensive. Few samples for the 
construction of the estimator will result in faster estimation but at the expense of slower mixing 
of the Markov chain. We explore the relationship between the number of samples and the 
efficiency of the resulting MCMC estimates. Under specific assumptions about the likelihood 
estimator, we are able to provide guidelines on the number of samples to select for a general 
Metropolis-Hastings proposal. We provide theory which justifies the use of these assumptions 
for a large class of models. On a number of examples, we find that the assumptions on the 
likelihood estimator are accurate. 

Keywords: Approximate Bayesian Computation; Auxiliary variables; Bayesian inference; Parti- 
cle filtering; Estimated likelihood. 



1 Introduction 



There has recently been a surge of interest in the use of unbiased estimators within Markov chain 
Monte Carlo (M CMC) schemes since their introduction in the context of B ayesian inference by 
BeaumontI (120031) . The the o retica l properties of such schemes were studied in lAndrieu fc Roberts 
(|2009l i while lAndrieu etHI (|20ld i mtroduce a general class of MCMC schemes ba sed on particle 
filter proposals for state-space models with intractable likelihoods. In particular, lAndrieu et al 



(|2O10l i introduce a specific method referred to as the particle marginal Metropolis-Hastings which 
uses a particle filter estimator of the likelihood within a Metropolis-Hastings sampler. In our article 
we shall call any method using an unbiased estimator of the likelihood within a Metropolis-Hastings 
sampler a particle MCMC (PMCMC) algorithm. 

Besides the choice of proposals inherent to any Metropolis-Hastings scheme, the main practical 
issue with PMCMC algorithms is the choice of the number, N, of samples or particles. Using 
many samples for the estimator will result in a MCMC scheme which has similar properties to the 
case where the likelihood is known but will be expensive. Using fewer samples for the estimator 
will result in faster estimation but at the expense of slowe r mixing in the r esulting Markov chain. 
Under certain assumptions about the likelihood estimator, Pitt et al. ( 20121 ) derive an optimal rule 
for choosing the number of samples which trades off these two competing concerns. However, the 
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assumption is made that the proposal for the Metropohs-Hastings scheme is the true posterior. In 
this article, under the same assumptions about the likelihood estimator, we explore the statistical 
efficiency of PMCMC estimates as measured by the Integrated Autocorrelation Time (lACT) for 
general Metropolis-Hastings proposals. This is achieved by introducing an alternative Markov chain 
whose lACT provides an upper bound on the lACT of the PMCMC scheme. This directly provides 
an upper bound on the relative inefficiency which is defined as the ratio of the lACT for the PMCMC 
scheme to the lACT for the known likelihood MCMC scheme. This quantity tells us how many 
times longer we need to run the PMCMC scheme than the MCMC scheme with a known likelihood 
in order to get the same precision for the estimation of posterior expectations. For any fixed N, we 
find that the relative inefficiency decreases as the lACT of the known likelihood scheme increases; 
that is as the Metropolis proposal becomes worse. 

Given a specified variance of an average under a PMCMC scheme at equilibrium, there is an 
associated computational time which is proportional to N times the lACT where the latter is itself 
a function of N. We would like to minimise this time but as it is generally intractable we propose 
instead to minimise an upper bound on this time which is derived from the upper bound on the 
lACT. The main finding is that N should be chosen such that the standard deviation of the log- 
likelihood estimator should be around 1, this standard deviation rising slightly when the proposal 
deteriorates. This suggests that choosing N so that this standard deviation is around 1 provides a 
conservative approach for general proposals. Additionally the computational time is relatively flat 
as a function of N within a wide interval around its minimizing argument. 

The two assumptions on the likelihood estimator central to our analysis are justified both the- 
oretically and empirically on two examples where the likelihood is known so that we can compare 
the performance, in terms of the lACT, of the PMCMC scheme to the MCMC scheme using the 
known likelihood. The first assumption is normality of the log-likelihood estimator which is the- 
oretically satisfied asymptotically in and appears to hold empirically even for moderate values 
of N. The second assumption is that the variance of the log- likelihood estimator evaluated at the 
MCMC draws of the parameter can be held constant. This can be achieved approximately in a 
principled way which will perform increasingly well as the size of the dataset increases. For both 
examples, we find that the theoretical insights with regards to inefficiency and computational time 
hold true. We believe the practical guidelines provided in this paper will be of value to practitioners 
as, independently of the proposal choice, they allow the user to select in a coherent framework. 



2 Metropolis-Hastings method using an estimated likelihood 



2.1 Auxiliary variables representation 

We briefly review how a non-negative unbiased li kelihood estimato r ma y be used within a MCMC 
scheme. This material is covered in more depth bv lBeaumoiitl ^2Q0i ) and lAndrieu &: RobertsI tOQ^h 
and by Andrieu et al. ( 20ld ) in the context of particle filterin g. Brief explanat ions of the method 
are also given in Section 2 of IPlurv fc ShephardI (mil) and in IPitt et al.l (|2ni2l l. 

Let y = ui-T = (y'l, ■■■^y'j'y be the vector of all the observations and 9 the vector of parameters. 
The likelihood of the observations is denoted p{y\0) and the prior for 6 is denoted p{0) so that 
the posterior density of interest is tt{9) oc p{y\9)p{9). We consider here scenarios where p{y\0) is 
analytically intractable but where we have access to a non-negative unbiased estimator p{y\9, u). In 
this notation, u represents all of the auxiliary random variables generated according to the density 
p{u) to construct p{y\9^u) so that 



fp{y\9,u)p{u)du = p{y\9), 



(1) 



and consider the joint density ir{9, u) of 6 and u as 



■k{9,u) =p{y\^iu)p{9)p{u)/p{y) =p{y\9,u)TT{9)p{u)/p{y\9)\ 
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in ([2]), p{y) = J p{y\0)p{9)d9 is the marginal likelihood. This joint density tt{9, u) admits the correct 
marginal density tt{9) for 9, obtained by integrating over u and using ([1]). We can now consider 
the Metropolis-Hastings method targeting ([2]). Suppose we have a joint Markov chain in {9j,Uj) 
targeting '7r{9,u). Then to move to the next step of the chain, i.e. Uj+i) we propose u* from 

p{u) and 9* from a proposal density q{9*\9j). We then take {9j^i,Uj^i) = {9*,u*) with probability, 

. r p{y\o*,u*)p{o*)q{Oj\o*) 

mm •( 

and take (^j+i, Uj+i) = {9j,Uj) otherwise. It is informative to note this Metropolis expression is 
identical to what we would obtain using the true likelihood function, except for the appearance of 
the estimated likelihood in place of the true likelihood p{y\9) in equation ([3|). In practical terms it 
should be mentioned that we do not normally retain or record the values of u. Instead we record the 
current value of the likelihood estimator p{y\9j,Uj) and note the new proposed value as p{y\9* ,u*). 

The key point is that the resulting Metropolis-Hastings chain admits 7r{9) as invariant distribu- 
tion irrespective of the variance of the estimator p{y\9, u). 




2.2 Scalar representation 

Let z = \ogp{y\9 ,u) — \ogp[y\9) = ip{9,u) be the error in the estimator of the log-likelihood and 
let g{z\9) be the density of z when u is generated from p{u) and the transformation z = ip[9,u) 
is applied, for a given value of 9. Practically we will consider Importance Sampling (IS) and 
Particle Filter (PF) estimators of the likelihood b ased on samples and fro m no w on we eniphasiz e 
notationally the dependency upon N . Following lAndrieu &: Roberts! (|2009l l and iPitt et al. I (|2012l l. 



we focus on the scalar auxiliary variable z rather than the high dimensional vector u discussed in 
Section [2] because it is possible to use our knowledge of the properties of gN{z\9) as A'' becomes 
large to make the analysis of the PMCMC chain more tractable. 

On this lower dimensional joint space, we may define the density from which we wish to sample 

as 

7rjv(^, z) = Tr{9) exp{z)gN{z\9) = 7r{9)7rN{z\9), (4) 

which is directly related to ([2]) through the many-to-one transformation from utoz. This is a proper 
joint density and we note that the conditional density 7rj\f{z\9) = exp{z)g]\f{z\9) arising from (jH) is 
also a proper density in z. 

To sample from 'kn{9, 2), we could use the Metropolis-Hastings algorithm previously described 
to sample from tt{9,u) and then set Zj = 'ip{9j;uj). However, we can alternatively simply proceeds 
as follows: given the current values {9j,Zj), we propose 9* from q{9\9j) and z* from gN{z\9*) (by 
transforming from u*) and accept the proposed pair with probability 

aQ{9j,Zj;9*,z*) =mm{l,eMz* - ZjM9*;9j)/uj{9j;9*)} , (5) 

where uj{9*;9) = TT{9*)/q{9*\9). As the proposal for 9 and the acceptance criteria are the same as 
used in the criteria ([3]), the reduced chain in our object of interest {9j} remains preserved. We refer 
to the Markov chain with acceptance probability ([5]) as the Markov chain Q. 



3 Unbiased likelihood estimators 
3.1 Importance sampling estimator 

Suppose that p{y\x;9) is the density of y conditional on the unknown parameters and the latent 
vector X with density p{x; 9). The likelihood and likelihood estimator are given by 

p{y\o) = Jp{y\x-9)p{x;9)dx, PN{y\e,u) = iv-'Ef=i^(^';^), 
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where the are independent, identicahy distributed samples from the importance density g{x\y; 6) 
and the weight function oj{x; 9) = p{y\6; x)p{x; 9)/g{x\y; 9). The vector u is the vector of canonical 
random variables used to generate the x^. The estimator p]^{y\9,u) is unbiased and so the PMCMC 
scheme given by the criterion of ([3]) may be used for any value of N . This method is explored in 
more details in Sections 15.11 and [6l If the variance of the weight function exists then a central limit, 
in A^, for the estimator is trivially available. 



3.2 Particle filter estimator 



Suppose that {xf}j>Q is a latent Markov process of initial density p{xq;9) and transition density 
p{xt-\-i\xt] 9) and we only have access to conditionally independent observations {yt}t>i c^rising from 
p(yt|xt; 6) so that the likelihood of the observations y is given by p{y\9) = f p{y\x; 9)p{x; 9)dx where 
X = XQ;T = (Xq, ...,x'j.)', 



p{x- 9)=p (xo; 9) Y[UiP{xt\xt-i;6) and p{y\x- 6) = \)^=iP{yt\xt; 0). 
In such scenarios, the variance of IS estimato rs can be too high w hen T is large. A n alter native 



(l200lh . The 



consists of using particle filters introduced bv lGordon et al.l (119931) : see iDoucet et al 
resulting likelihood estimator PNiy\9,u) is unbiase d. seelDel Morall (j2004l ). and a central limit the^ 



orem, in N for fi xed T, for the esti mator holds, see Del Morall ( 2004 ) (Proposition 9.4.1, page 301). 
Appendix A. 2 of iPitt et al.l (120121 ) provi des a detailed descriptio n likelihood estimator p]\f{y\9,u) 
for the general auxiliary particl e filter of Pitt &: Shephard ( 19991 ) with a single resampling step at 



each time point as suggested by Carpenter et al. ( 19991 ) 



3.3 Approximate Bayesian computation 

Suppose that py(y|^) is the likelihood of the unknown parameter vector 9 for observations y, and 
that we cannot evaluate the likelihood, but we can generate pseudo-observations from it from it. 
In such case s , the method of Approximate Bayesian Computation (ABC) has become popular; see 
Marin et al. ( 201ll ) for a recent survey. Let S{y) be a vector of summary statistics for y, e.g. S{y) 



could be the whole of y, or the sample mean, sample standard deviation or sample quantiles of 
y. We define the ABC likelihood of S{y) as p^«c(S(y)|0) = J K, {S{y)\ S{x)) pY{x\e)dx, where 
(<S'(y)| 'S'(a;)) is a probability density function of argument S{y) and e > is a scalar such that 

lim^ {S{y) \ S{x)) is the Dirac delta measure located at S{x). ABC inference relies on the 

modified posterior vr^^'^ {6) oc p^^'^{S{y)\9)p{9). The ABC likehhood is typically intractable but an 
unbiased estimator is given by 

pt:^^{S{y)\9,u) = N-'ELiKe (S{y)\ S{x') 



where x^,...,x^ is an iid sample generated from py{x\6) using some random variables u. The 
estimator p^^ {S {y)\9 , u) can be used within MCMC to sample from vr^^^ {9). This estimator may 
be viewed as an IS estimator as discussed previously. 



4 Main results 
4.1 Outline 

This section presents the main contributions of the paper. The objective to is to construct an 
upper bound on the inefficiency (lACT) for the PMCMC scheme Q on {z,9), of Section [2.2[ This 
in turn allows us to consider minimising, with respect to A'^, the corresponding Computational 
Time (CT) which takes into account both the lACT and the cost of obtaining an estimator of the 
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likelihood pi\i{y\0,u). In Section 14.2^ we define lACT and consider the general properties of the 
MCMC scheme, Q^'^, for the case where the likelihood p{y\9) is exactly known. We introduce a 
new reversible Markov chain Q* in Section [4.31 with the same invariant distribution as the PMCMC 
scheme Q, of Section [2.21 The acceptance probabilities are lower, for the same proposals in {z,9), 
for Q* than for Q. As a consequence, the lACT of any function of 6 is greater for the Q* chain. 

In Section [4.41 two assumptions on the distribution of the error of the log likelihood estimator, 
z, are introduced. The first assumption is a normality assumption for z. The second assumption is 
that the variance of z can be made constant by varying in the PMCMC scheme based upon 
the proposed value for 9. These assumptions are examined more closely in later sections. Under 
these assumptions, in Section 14. 4[ it is possible to obtain an upper bound for the lACT of any 
function of 9. This is an upper bound on the lACT under the Q* chain which in turn bounds the 
lACT under the PMCMC chain Q of interest. The bound has a partially tractable form in that it 
can be evaluated exactly given cj^ and the lACT of the MCMC scheme, Q^'^, considered in Section 
14.21 Our results show that under the specified assumptions, the relative inefficiency, defined as the 
ratio of the upper bound on the inefficiency to the inefficiency for the MCMC scheme Q^^, falls 
as a function of u^. This is a relevant measure as it quantifies the cost due to using the likelihood 
estimator in place of the true likelihood. Under the same assumptions, we also establish a bound 
on the average acceptance probability of the PMCMC scheme Q* in terms of o"^ and the average 
acceptance probability of the MCMC scheme, Q^'^. 

In Section [4.5[ we define the concept of Computational Time (CT), which takes into account 
both lACT and the cost of obtaining the estimator pN{y\6,u). Under the assumptions of Section 
14. 4| an upper bound on CT is constructed and this bound is then minimised with respect to A^. It 
is found that A^ should be chosen such that a should be around 1. 



4.2 Exact likelihood 

The measure of the inefficiency of a Markov chain in equilibrium is the i ntegrated autoc orrelation 
time (lACT). The standard definition of this quantity, see for instance Tierney ( 19941 ). is given 
below. 

Definition 1. Consider a stationary Markov chain {9j} with invariant density tt{9) and suppose 
that h{9) is a function of 9 with Y-j^ [h{9)] < oo. Define Hh = [^(9)] and ^h,n = Sj=i h{9j). 
Then the lACT of the Markov chain IFh is given by 



lim nV, {Jlh,n) = V, [h{9)] IFh with IFh = l + 2J2^^,ph{T), 

n — >oo 

where Ph{T) is the autocorrelation at lag r of the stationary sequence {h{9j)}. 

Intuitively the lACT, IFf^, quantifies how many times more samples are required from the 
Markov chain {9j} relative to the number of samples required using independent identically dis- 
tributed samples from tt{9), to achieve a given precision. 

Lemma 1. Consider a reversible and ergodic Markov chain {9j^ with invariant density tt{9). Sup- 
pose that Vtt [h{9)] < oo. Let ph{T) be the autocorrelation function at lag r of the sequence {h{9j)} 
and let IF^ be the associated I ACT, as in DefinitionlJl Then 

(i) The autocorrelation and I ACT are given by 

Ph{r) = j\\\^\Md\) and IFf, = l + 2Zr=iPhir) = j'.il + A)(l - Xr^E^idX), 

where Eh is the positive measure on (—1, 1) that is associated with h in the spectral decompo- 
sition of the transition operator of the chain. 
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(a) If IFh < oo, then y/n{fih^n — fJ-h) — ^ A/'(0;V^ [^(0)] IFh), where ^h,n fih CLfS- specified in 
DefinitionUl 

This lemma follows from Theorem 4 an d Corollary 6 in Haggstrom fc Rosenthal ( 200?! ) which it- 
self follows from Kipnis Sz Varadhan ( 19861 ) . In this paper, we use Lemma[l]to specify the properties 
of the standard MCMC scheme. 

We consider the Markov chain where the likelihood p{y\9) may be evaluated exactly. The 
resulting chain {9j} will be denoted as Q^-^. As for the PMCMC scheme, given by the equivalent 
Metropolis acceptance probabilities ([3]) and ([5]), the candidate value 6* is proposed from q{6*\6j). 
The Metropolis acceptance probability is given by 

Oqex {ef,e*)=mm{l,u;{9*;ej)M6j-6*)}, (6) 

where uj{6*; 9) = '7T{9*)/q{9*\6), as in ([5]). This standard MCMC scheme is reversible with invariant 
density 7r{9). So from Lemma [H assuming the chain is ergodic and V^r [h{9)] < oo, we may define 
the autocorrelation and lACT for the function h{9) as 

pT (r) = /iAl^li?,(dA) and IFJ^- = 1 + 2j:^^,pf^ (r) = /^^(l + A)(l - Xr'E,{dX), (7) 

where is the positive measure, for h, specifically associated with the transition operator of Q^^, 
the chain resulting from the known likelihood. 



4.3 Bounding chain for PMCMC 

We now consider the introduction of a new Markov chain Q*. The reason for introducing this Q* 
chain is theoretical rather than practical. The lACT of the Q* chain provides an upper bound on 
the I ACT under the PMCMC chain Q, detailed in Section [2. 2[ Prior to this we define a new chain 
for (9, z), which is a special case of Q. Specifically, let be the Markov chain {6j, Zj} when the 
proposal density for 9, in Section [2^21 is perfect, i.e. when q{9\6j) = 'n{9) in ([5]). Then we propose 
9* from 7r(0) and z* from gN{z\9*) and the acceptance probability, of ([5]), reduces to 

aQz{zj; z*) = min{l,exp(z* — Zj)} . (8) 

Whilst the chain itself explores the joint space for {9,z) with invariant density ([3]), the acceptance 
criterion clearly only involves the current and proposed values of z. The Markov chain Q* is now 
detailed in the algorithm below. 

Algorithm 1. (Construction of the Markov chain Q* ) The current value of the chain is {9j,Zj). 
The new value is generated according to steps (a) to (c) below. 

(a) The proposal 9* ~ q{9\9j) is accepted with probability aQEx{9j;9*) of 

(b) The proposal z* ~ gfq{z\9*) is accepted with probability aQz{zj\ z*) of ^3^. 

(c) Set = {9*,z*) if there is acceptance in both criteria (a) and (b). 
Otherwise, set {9j^i, zj+i) = {9j,Zj). 

The following Lemma establishes the relationship between the chains Q, Q*, Q^^ and Q^. In 
particular, there are two cases where the Q* chain becomes identical to the PMCMC chain Q. We 
shall denote the acceptance probability for Q* , of Algorithm [H as aQ*{9j,Zj;9*,z*). 

Lemma 2. The Markov chain Q* has the following properties: 
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(i) 

aQ{ej,Zj;e*,z*) > aQ*{ej,Zj;e*,z*) = aQEx{9j-e*) x aQz{zj-z*). (9) 
(a) Q* is a reversible Markov chain with invariant density Tr{6)Tr]\r{z\9) of ^ for any N . 
(Hi) For any function h{6) the lACT is higher for the Q* chain than for the Q chain. 

(iv) The criterion aQ*{0j, Zj;9* , z*) = aQ{9j, Zj;6* , z*) = Uqex {8j;8*), when zj = z* = 0, i.e., 
the likelihood is known. 

(v) The criterion aQ*{6j, Zj;6* , z*) = aQ{9j, Zj;9* , z*) = Oqz {zj;z*), when aQEx{9j;9*) = 1. A 
case where this will occur is when the proposal is perfect, i.e., q{9*\9j) = tt{9*). 

Proof. See Appendix 18. 1[ □ 

It is helpful to think of the Q* chain as a "sticky" version of the exact likelihood chain Q^^ as 
it has a lower acceptance probability for the same proposal; see expression ([9]). This Q* chain will 
form the basis for analysis of Section I4.4[ In particular, assumptions will be placed on the density 
gi\f{z\9), introduced in Section \2.2\ which will enable the lACT associated with the PMCMC chain 
Q to be bounded from above. 

4.4 Quantitative bounds on the inefficiency 

For the IS and PF likelihood estimators p]sf{y\9,u) of Sections 13.11 and 13.21 respectively, a central 
limit theorem in holds with asymptotic variance ^'^{9)/N. The delta method suggests that the 
approximations to gN{z\9) in the following assumptions will w ork well in practi ce. This is examined 



in more details in Section [5l Similar assumptions are used in lPitt et al.l (|2012l ). The following two 
assumptions are explored in more details in Section [H 

Assumption 1. Let z = \ogpN{y\9,u) — \ogp{y\9) he the error in the estimator of the log likelihood. 

1. We have 

gN{z\9) = <p{z;-j\9)/2N,-f\9)/N) and 7rN{z\9) = eMz)9N{z\9) = c/) {z;j\9)/2N,-f\9)/N) 
where (j){z;a,b'^) is a univariate normal density in z with mean a and variance 6^. 

2. For a given value of a"^ , the number of particles is set equal to N = N^2{9) = 7(^)^/0"-^. 

Under Assumption [H both gN{z\9) and ■kn{z\9) are functions of cr^ only and we write gN{z\9) 
and ■kn{z\9) as 

g{z\a'') = (l){z--a^/2,a^), T^{z\a') = ^{z-a'' 12,0^). (10) 

We emphasize that under this assumption z and 9 are independent. Let I Fj^ {a'^) , I F^* {a'^) , I FJ^^ 
and IF^[a'^) be the integrated autocorrelation times of the Q,Q*,Q^^ and Markov chains 
respectively, defined previously. Lemma [3] summarizes the properties of IF^{a^), under Assumption 
[H and gives an exact expression for IF^{a'^), which shows that IF^{a'^) does not depend on h. We 
note that IF^{a'^) can be computed very quickly by quadrature as it is a one dimensional integral. 

Lemma 3. Suppose that Assumption{l\ holds where g{z\a'^) and 7r{z\a'^) are defined in ilO\) and 
is fixed. Then, the following results hold for the chain {9j, zj} arising from that uses the perfect 
proposal q{9\9j) = vr(0), which is defined in Section \4.3\ 

(i) Letp[z;a'^) be the probability of rejection given the current value z. Then 

p{z- 0-2) = 1 - / aQz{z- z')g{z'\a'^)dz' = ^{z/a + a/2) - exp(-z)$(z/cj - a/2). (11) 
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(ii) 



IF {a)= E.(.|.) , _ = / 1 _^.^„ A {w)dw, (12) 



1 — p(z;(T2)y J \ — p*{w,a) 

where p*{w, a) = ^{w + a) — exp(— — a'^ /2)^{w). This means that IF^{a'^) is independent 
of the functional h. 



Proof. The proof is provided in iPitt et al. 



Pitt et al. 


( 


2012^. F 


Note that 


Pitt et al. 



(I2OI2I'). Part (i) is given by Lemma 3 and part (ii) by 



Pitt et al.l (I2OI2I ). without any loss of generality, added 



Lemma 4 of IPitt et aP \2Qli ). Note that 
(T^/2 to the proposed and accepted values of z so that g{z\a'^) is centred around and 7r(z|(T^) 
around o"^ in their notation differing from the forms of these densities given in (llOp . □ 



Theorem [T] provides an analytically tractable upper bound for the lACT of the PMCMC sam- 
pling scheme. 

Theorem 1. Suppose AssumptionUl holds and 

J^^{l + Xr'Eh{d\) (13) 

where Eh is the positive measure, for h, associated with the transition operator of Q^^ , see 
Then, for a given > 0, 

lF^{a^) < IF^'\a') < IFh'{a% (14) 

IF-{a') = 1(1 + IFnil + IF\a')) - 1. (15) 

Proof. See Appendix 18. 1[ □ 

Remark 2. Consider a Markov chain {6j} constructed from the original chain Q^^. A thinned 
chain is constructed as 9j = 02j- Then the I ACT of the chain {6j} is given by X^;^(l + A^)/(l — 
X'^)Eh{dX). If this is finite then 1113]) holds. This may be seen as /q^(1 + X)~^Eh{dX) < 00, and 



(dX) < 00. 



The remark indicates that under rather weak conditions the additional inequality of (jl3p in 
Theorem [T]will hold. We note that this remark represents a sufficient condition, not a necessary 
condition, for the existence of the integral of (jl3p . 



Corollary 1. The inequality of TheoremUl becomes an equality in two cases: 

1. IFl^ia^) = IF-{a') = IF--, asa^O. 

2. IFf^ia"^) = IF^ia"^) = IF^ia"^), when the proposal is perfect, i.e. q{6*\e) = -k{6*). 

Proof. The proof is immediate by inspecting (jl5p and noting that IF^(a'^) — > 1 as cr — > for 
part (1) and noting that IF^^ = 1 when q{9*\e) = 7r{9*) for part (2). □ 

We briefly examine bounds for the marginal probability of acceptance in the PMCMC scheme 
Q. Although we do not optimise the marginal acceptance probability, it provides an indication as to 
whether the asymptotic assumptions (in A^) made in Theorem [T] are reasonable in practice, because 
acceptance probabilities can be easily estimated during the PMCMC run. We use the bounding 
Markov chain Q*, together with Assumption [U to bound the marginal probability of acceptance Q. 
Let P'-^{A\a'^),P^^{A),P^{A\a'^) and P'-^*{A\a'^) be the marginal acceptance probabilities for the 
chains Q, Q^^, and Q* respectively. 
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Corollary 2. Suppose that Assumption{I\ holds. Then, 



where $(•) is the standard normal cumulative distribution function. This inequality becomes an 
equality when either a — > or P^^(A) = 1. 

Proof. See Appendix 18. 1[ □ 

We note that P^^{A) = 1 when the proposal is exact, i.e. q{9\0j) = tt{6). This wiU also occur 
when the proposal has the correct invariant distribution, that is when 7r{9)q{9'\9) = Tr{9')q{0\9'). In 
Sections 15.11 and 15.21 and in particular in Figures [2] and HI we examine how well the bound holds 
in practice. Whilst studying the marginal acceptance probability will give an indication of whether 
the asymptotic, in N, assumptions are accurate, it is the overall inefficiency (lACT) which is of 
central concern. 

It is instructive to examine the ratio of the inefficiency upper bound IF^{a'^) defined in p5|) to 
the inefficiency IFJ^^ when the likelihood is known, which we call the relative inefficiency bound, 
RIF}^{a^). From 



RIF^ ) - 2 Jf^ 7^ - 2 jFf^ + -{l + IF {a)). (16) 

Figure n plots RIF^^{a'^) against a and against I/ct^ for five values IF^"^, {1,2.5,4,20,100}. We 
consider IFf^^ > 1, so that the inefficiency of the MCMC scheme when the likelihood is known 
is at least as great as if we had independent samples from 7r(0). If IFf^^ = 1, corresponding to 
the perfect proposal 7r(0), then RIF^{a'^) = IF^{a'^) as noted already. For fixed a, and so for 
fixed IF^{a'^), we examine what happens to RIF^{a'^) as the proposal for 9 deteriorates, i.e. IFj^^ 
increases monotonically from 1. Equation ()16l) shows that RIFf^{a'^) decreases from a value of 
IF^ia"^) and, as IF^"^ — > oo, RIF^{a'^) — > \{l + IF^ia"^)) < IF'^{a'^). This implies that, under 
the assumptions of Theorem [H the loss in efficiency from using the estimated likelihood (for a fixed 
o") goes down as the original proposal deteriorates. See the bottom two panels of Figure [1] where 
RIF^{(j'^) falls (for fixed a) as IFJ^^ increases. Figure [1] also shows the limiting form of RIFJ^^a^) 
hmit as IF^^ rises (see IF^^ = 20 and IF^"^ = 100). 

Intuitively, using an estimated likelihood, rather than the true likelihood, leads to the 'stretch- 
ing' of the original Markov chain. If the Markov chain mixes slowly to begin with, due to a poor 
Metropolis-Hastings proposal, then the cost of stretching this chain is small as there is less infor- 
mation in the original 9 iterates of the Q^^ chain. On the other hand, if the chain Q^^ mixes very 
rapidly, then randomly stretching this process is inefficient. This suggests a certain robustness in 
the PMCMC approach, as for any fixed value of a, RIF^{a^) < IF'^{a'^) regardless of the proposal 
used. Similar patterns are observed in the empirical results reported in Section O and in particular 
Figures [3] and El 



4.5 Optimisation of the computing time 

We now show how to use the results of Theorem[T]to offer guidance on the choice of A^, or equivalently 
the choice of a, for general Metropolis-Hastings proposal schemes. The Computing Time (CT) for 
the PMCMC scheme Q is defined as CT^{a'^) = I Fj^ (a^) / a'^ . We define the upper bound on 
the computing time as CT^{a'^) = IF^ (cr^)/cj^. This is the quantity that we are interested in 
minimising with respect to a. 

Corollary 3. Suppose Assumption{l\ and U3\) hold. 
(^) CT;^{a^)<CT;;{a'). 
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(ii) IflF^"" = I, then CT;^\a^) is minimised at a^^ = 0.92 and RIF|^{cJ^^) = IF^a'J^t) = 4.54, 
P'{A\a^t) = 0.51. 

(Hi) If IF^^ > 1, the minimising rises with IF^^ to = 1.02 as IF^^ — > oo. 
(iv) Define the relative computing time for the inefficiency bound IF^'{a^) as 

RCT,{a)- ^^^^ _____ _ . 

Then, RIFf^{a'^) and RCT^{a^) are decreasing functions oflF^^. 

Proof. See Appendix 18. 1[ □ 

The top two panels of Figure [1] plot the relative computing time RCT^ {a'^) against a and 1/cr^ 
for various values of IF^^, where RIFf^{a'^) is defined in (jl6p . The top right panel shows that when 
IFJ^^ = 1, this is minimised with respect to a at 0.92, whereas the value of a at the minimum rises 
as IF^^ increases. Figure [1] also suggests a reduction and flattening out in the shape of RCT^{a^) 
as IFj^^ increases. 



5 Applications 

This section examines two models for which we use both PMCMC and standard MCMC. In Section 
15. H a simple Probit model is considered for which the standard importance sampling approach is 
used to obtain an estimator for the likelihood. In Section 15.21 a state space model is considered 
for which we use a particle filter method to estimate the likelihood. Both of these examples are 
deliberately chosen as the likelihoods can be evaluated in closed form allowing a direct examination 
of how well the approximations of the earlier sections perform in practice. For both models we will 
consider proposals which have varying degrees of inefficiency for the case where the likelihood is 
known. So, using the notation of Section [4.21 the quantity IFj^^ for the chain of ([6]) is artificially 
increased. This is simply done, for both models, by using a Metropolis proposal for 9 which is 
autoregressive, where the persistence is increased towards 1. By considering models for which the 
likelihood can be exactly computed we are able to run a standard MCMC routine, the chain of 
([6]), and estimate IFJ^-^, of ([7]). Therefore many of the quantities considered in Section 14.41 can 
be estimated straightforwardly. In these applications we will fix cr^(^T), the variance of the log- 
likelihood at a central parameter ordinate, by choosing N. 

There are several features of interest for both examples. Firstly, we are interested in the marginal 
acceptance probability of the PMCMC scheme Q which should, under the assumptions of Section 
14. 4[ be dictated by the bound of Corollary [2] as a function of a. This provides an insight into 
whether the assumptions are reasonable in practice. Secondly, the inefficiency of the PMCMC 
scheme IF^{a'^) is of interest quantitatively and qualitatively in terms of the shape of the function 
against a. In particular, this needs to be compared with IFf^{a'^) of Theorem[TJ Third, the relative 
inefficiency RIF'f^{a'^) = I F^ {a'^) / 1 F^^ is of importance as this measures how much we lose by 
estimating the likelihood relative to knowing the likelihood exactly. From ()16p and the discussion 
of Section 14. 4^ we expect that for fixed values of a this will fall as IFf^^ increases. Fourthly, we 
similarly expect that RCT^i^a"^) = RIF^{a'^)/a will fall as IFf^^ increases for fixed a. So we might 
expect similar behavior to Figure [TJ We expect that this function will flatten out with a minimum 
value slightly greater than a = 0.92. Finally, we are interested in robustly exploring Assumption [1] 
of Section 14.41 As the log-likelihood errors z are available, it is possible to record both the values 
in the PMCMC scheme which arise from gi^[z\9) and those values corresponding to acceptance 
which arise from TTiq{z\6). These are recorded over the entire PMCMC experiment so that 9 ~ vr(0) 
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approximately and the histograms of these proposed and accepted values will be compared to the 
densities of (fTO]) using a fixed value of (t'^{9t)- 



5.1 Example 1: Probit model 

The first example is a simple cross-sectional model for which the likelihood p{y\9) is known exactly. 
We use it to compare the properties of the PMCMC scheme Q with the corresponding MCMC 
scheme. We use a simple Bernoulli Probit model, where yt = I{xt > 0) for t = 1,...,T, with 

xt *~ ]^{9; 1). In this case we know the likelihood explicitly as Pr(yt = 1) = ^{0). We estimate the 
likelihood in a very simple way by using the estimated probabilities Pr(yt — ^) — ^ ^i^t'^ > 

0), where rrj'^'' *~ N{9;1), for k = 1,...,N and for t = 1,...,T, the number of individuals. To 
propose 9 for both the PMCMC and MCMC schemes we use an autoregressive Metropolis proposal 



q{9*\9j), see for example [Tierneyl (|1994l ). so that 9* = 9 + p{9j-9) + - p'^)/{v - 2%, where 

9 is taken as the posterior mode and is chosen as the negative inverse of the second derivative 
of the log posterior. Here tiy denotes a standard t-distributed random variable with v degrees of 
freedom. We set v = b and place a Gaussian prior on 9 centred at zero with a large variance. We 
consider a range of values {0, 0.4, 0.6, 0.9, 0.97} for p. We would expect that p = Q would deliver 
the best performance in both the MCMC and PMCMC schemes, in terms of lACT, and that the 
performance would decline as p increases towards one. The same proposal is used for the standard 
MCMC scheme and for the PMCMC. 

Prior to reporting the results, the method of choosing N for the PMCMC scheme is detailed. 
Before the full PMCMC scheme is implemented, the following steps are carried out. We choose a 
large initial value for the number of samples, Ns, and compute an estimate of 9t '■= '^■k [9) using a 
short PMCMC chain using Ns samples for the likelihood estimator. We estimate the variance of the 
log-likelihood estimator at 9t as Y{9t) = Y[logpNg{y\9T,u)]. This consistent estimator is discussed 
in Section [6] and is given by (j23p together with the discussion immediately after this expression. We 
then set N^- = NsY{9t)/ k'^ where is the target variance and proceed with the PMCMC scheme 
keeping this value constant. This is a static scheme in that the number of samples A'^^ does not 
change across PMCMC iterations, see Section [6l A dynamic scheme, where the number of samples 
is allowed to change is detailed in Section [6l 

Figure [3] displays the output for the PMCMC and MCMC analysis applied to a single simulated 
dataset y where 9 = 0.4 and T = 100. Figure [2] shows the marginal probabilities of acceptance. The 
probability of acceptance denoted by P^^{A) in the standard MCMC scheme increases from 0.9428 
{p = 0) to 0.9815 {p = 0.97). We wish to compare the probability of acceptance in the PMCMC 
scheme with the lower bound of Corollary [2j To do this, we evaluate the standard deviation (t{9t) 
of the log-likelihood estimator, for different values of N ^ at the central value 9t- The values for A^ 
and the standard deviation a{9T) are given in Table [TJ In Figure [21 the constant value P^^{A), 
the probability of acceptance for the PMCMC scheme and the lower bound implied by Corollary 
[2] are all plotted against a{9'r), for differing values of p. The values of P^^(A), see Table [H in the 
standard MCMC are high ranging from 0.94 {p = 0) to 0.98 (p = 0.97). We would expect the lower 
bound to be tight as discussed immediately following Corollary [2j This is the case for all values 
of p for the proposal and for the range of (t{9t) considered. In fact the true marginal acceptance 
probability for the PMCMC scheme is virtually indistinguishable from the bound in this example. 
Table [T] shows the results corresponding to Figure [2] for the independent proposal (p = ) and 
the slowly mixing proposal {p = 0.97). The inefficiency, shown in Table [H has been calculated 
by estimating the lACT from a very long run for both the MCMC {IFf^) and PMCMC (IF^) 
schemes. The computing time is calculated as CT^ = Nx IF^ for the PMCMC scheme. We used 
a minimum of 30, 000 draws for the PMCMC scheme but increased this depending on the lACT for 
the standard MCMC scheme that uses a known likelihood, denoted by IF^^ as in Section 14.41 The 
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bottom left panel of Figure [3] plots both the PMCMC relative inefficiencies RIF^ = IF^/IFl^, 
the PMCMC inefficiency divided by the MCMC inefficiency, against N for different values of p. 
Similarly, the top left panel, displays the relative computing time {N x RIF^) against N . 

The results show several important points and appear to confirm the insight from Section 14.41 
and Section 14. 5i We set h (9) = 9. When p = 0, so the independent proposal density q{9) is close 
from the true posterior vr(0), the minimum of the estimated RIF^ is achieved when a = 0.92 and 
at this value IF^^ = 4.33. This is in agreement with the corresponding predicted theoretical values, 
discussed towards the end of Section [4.5L when the proposal is the true posterior Tr{9), which are 
RIF^{a^) = 4.54 when a = 0.92. As p increases, the proposal becomes worse in terms of mixing and 
the estimated quantity IFJ^-'^ rises, whilst the relative inefficiency, RIF^ and the relative computing 
time RCT^ (here calculated as x RIF^) become smaller, holding iV, or equivalently a{9T), fixed 
as seen in Figure [3l The optimal value for a at which RCT^ is minimized appears to increase 
from a = 0.92 as p increases. Equivalently, the value of decreases. 

The loss, in terms of CT^, see Table [H or equivalently RCT^ in Figure [3l from choosing a 
different value (j{9t) from 0.92 (A^ = 120) relative to the optimum value is not too great. The 
theoretical results of Section see Figure [Din particular, suggest that as long as cr^ is between 
0.25 and 2.25 then the penalty is at most a doubling of CT^ compared to the optimum. Equivalently 
this suggests similar changes which result from choosing N between N /2.2b and 4 x A^, where A^ 
is the value corresponding to cj^ = 1. This is a reasonably large range. In the results for this 
example, see Table [H where A^ is approximately 100 {Nopt = 120 corresponding to (t{9t) = 0.91) 
then this suggests that RCT^ should vary by no more than double from the minimum when A^ is 
between about 44 to 400. This is confirmed in the results of Tabled) It is clear, from Figure [3] that, 
as expected, RCT^ declines, for fixed (j{9t)i as the proposal becomes worse and fiattens out as a 
function of a{9T)- The plot suggests that RCT'f^ may have a limiting form as p — )■ 1, that is, as the 
proposal deteriorates. This is the case for the theoretical upper bound RCT^{a) as IF^^ — t- oo, as 
discussed towards the end of Section 14.41 

Finally, in FigureEl the histograms of the accepted and rejected values of z for the case A^ = 120, 
(j{9t) = 0.91. The accepted and rejected values arise from 'kn{z\9) and gN{z\9) conditionally, see 
Section [2.21 but these densities are integrated with respect to 'k{9) as values are recorded for each 
9 arising from the PMCMC scheme. So the recorded accepted and proposed values arise from 
t^n{z) = jTTi\[{z\9)Tr{9)d9 and gN{z) = JgNiz\9)ir{9)d9. Superimposed upon the histograms of 
Figure [6] are the approximating densities ■K{z\a) and g{z\a) of (fTO|) where (t{9t) = 0.91. It may be 
seen that the approximating densities correspond very closely to the histograms. 



5.2 Example 2: Autoregressive model plus noise 

We consider observations generated by a first order autoregression (AR(1)) plus noise model as a 
simple example and apply the one step fully adapted particle filter (FAPF), see Pitt &: ShephardI 



( I999I ). The observations arise as yt = xt + (Je^t, whilst the state evolution is xt+i = — '?!') + 



cpxt + (TrjTjt, where et and r]t are standard normal and independent. We take (j) = 0.8, px = 0.5, 
= [1 — (f)'^), so that the marginal variance o"^ of the state xt, is 1. We simulate a single series 
of length T = 300 with cr^ = 0.5 for the experiment. The initialisation of the PMCMC scheme in 
terms of choosing A^ is similar to the initialisation for the Probit example of Section 15. 1[ In this case 
however, we obtain the estimate of the variance V{9t, Ns) by running a fixed number m = 1, M 
of independent likelihood estimators \ogpi\it;{y\9T,u^) for 9t, using Ns particles and recording the 
sample variance 

V{h,Ns) = M-'Z^^,logpNs{y\OT,un'' - {M-'E^^i^ogpN^^^^^ (17) 
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We are taking independent identically distributed samples of logpj\[^{y\9T,u) which are close to 
being Normally distributed. So it can be seen that approximately V{6t, Ns)/Y[logpN^{y\9T,u)] ~ 
x\^_i/{M — 1). As a consequence M does not need to be large (much less than 100) in order to 

estimate Y[log pNg{y\9T,u)] sufficiently accurately. To choose an effective value for the number of 
particles we would then take N^^ = Ns x V{9t,Ns)/k^, where k = 0.92. 

In this experiment we estimate the parameters 9 = (/U^, 0, c^)', holding the measurement vari- 
ance a1 fixed. The true likelihood for the data is computed by the Kalman filter as the model is 
a linear Gaussian state space model. This allows the MCMC scheme using the exact likelihood 
to be applied. As in Section 15. 1[ we consider varying so that the standard deviation of the 
log-likelihood estimator a{9T), taken at the central ordinate 9t (in this case the estimate 9 of the 
posterior mean ). The grid of values we consider is {11, 16, 22, 31, 43, 60, 83, 116, 161, 224, 312}. 
The value A^ = 60 results in cj(^t) = 0.92. 

The proposal we consider is similar to that used in Section 15. li We use a multivariate t- 
distribution centred on the mode of the log-likelihood obtained from the Kalman filter. Similarly 
the variance is computed using the matrix of second derivatives of the log-likelihood at the mode. 
We transform each of the parameters to the real line so that ^ = k{9), where both 9 and ^ are three 
dimensional vectors. We use an autoregressive proposal with the scalar autoregressive parameter 
p chosen as one of {0, 0.4, 0.6, 0.9, 0.97}. We first apply this proposal, for the four values of p, 
using the known likelihood in the Metropolis scheme and estimate the integrated autocorrelation 
time (IF) for each of the parameters 9 = {px, (t>i o'??)'- 

The acceptance probability for the PMCMC scheme against a{9T) is shown in Figure [H Tables 
[2] and [3] show the results for p = and p = 0.9. The relative inefficiencies and computing times 
against both A^ and (j{9t) are displayed in Figure [5j The results are very similar to the results 
for the Probit example and the theoretical insights suggested in Section 14.41 In particular, the 
relative inefficiency falls as the proposal deteriorates. The graph of RCT^ against (j{9t) suggests a 
fiattening out effect as p becomes larger. As for the Probit model. Section 15. H the histograms for 
the accepted and rejected values of z (for A^ = 60 when a{9T) = 0.92) are shown in Figure [6j The 
approximating densities of (jlOh . with a = 0.92, are superimposed. Again it may be seen that the 
approximating densities correspond very closely to the histograms. 

Empirically, similar results for CT^ against a {9T) were observe d for scenarios where the accep- 



tance probability P^^{A) was low; see Table 4 in lPitt et al.l (j2012l ). 



6 Large T results for Importance Sampling Estimators 

We will examine part (1) and (2) of Assumption [T] showing that, for moderately large T, both 
parts can be justified. We consider the IS estimator in Section 13.11 applied to panel data where 
each panel may be cross-sectional or a time series. In particular, we show that a very efficient 
estimator of the variance of the log-likelihood estimator is available. The observations in each 
panel are conditionally i ndependent, given the vector of latent variables x. Examples of such 



models are considered in (jGourieroux &: Monfortl . Il996l . Chapters 1 and 3) in the context of sim 



ulated maximum likelihood. For notational simplicity, we drop any dependence on covariates 
and denote the observations (which may be vectors) as yt for t = 1,...,T. In this case, it is 
assumed that in the model yt ~ p{yt\9;xt) and xt ~ p{xt\9). The error in the log-likelihood es- 
timator is z = Ylt=i {^'^SPN{yt\9,ut) — logp(y(|^)} , where, similarly to Section \3A\ PN{yt\9,ut) = 
^/^^k=i^(^ti^) ^■^d ujt{xt',9) = p{yt\9; xt)p{xt\9)/g{xt\9;yt). The x^ are independent, identically 
distributed samples from g{xt\9;yt) that we can consider as a function of the components of ut- We 
note that 

Eg {PN{yt\9,ut)/p{yt\9)} = f{ojt{xt;9)/p{yt\9)}g{xt\9;yt)dx = 1. 
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Define := N x Yg {pNiyt\O,ut)/p{yt\0)}. Then, 

a^{0)=Yg{ujt{xt;e)/p{yt\e)} = f{u;t{xt;ef /p{yt\ef}g{xt\e;yt)dx - 1. (18) 

We assume that the proposal g{xt\9;yt) is sufficiently heavy tailed to ensure (t^{9) < oo for all t. 
We note that any covariates entering into p{yt\9; xt), or p{xt\6), are captured by the terms cr^{9). 
We define eN,t ■= VN {pN{yt\9,ut)/p{yt\9) — 1) /at{9), so that eN,t is an independent series with 

^{£N,t) = 0, E(e^ J = 1 that satisfies a central limit theorem (in N), i.e., eN,t A/'(0, 1) as — )■ oo. 
Furthermo r e, we assume that Kg \^uJt{xt; 9)^} < oo for all t and 9, so that E(e^ J = 0(l/\/iV); see 



(jVon Bahii . Il965l . Theorem 1). If the distribution of 0Jt{xt;9) admits a density with respect to the 
Lebesgue measure which is differentiable at 0, \^iOf{xt] 9)^^ ,Kg \^LOt{xt; 9)^^ < oo for all t and 9, 
then the error z in the log-likelihood estimator can be expressed as 

2 = Ef.i log (PN{yi\<),n,)My,\e)) = Ef.i log (l + <7,(9)«j/Vw) (19) 




ViV 2N 3NVN 

Hence, we have 

Y{z) = TiP{9,T)/N + 0{T/N^), E{z) = -TiP{9,T)/{2N) + 0{T/N^), (21) 

where xl^{9,T) = Ylt=i^ti^)/'^- ^ consistent estimator of il){9,T) is 

i^N{9^T) = ELi^t{e)/T, where d^O) = {N-'Ek=Mx^-^(^f} {PN{yt\9,ut)y^ - 1. (22) 

This can be compared to the expression of cr^{9) of (jlSp . The expression for af{9) is a function 
of two sample means. Hence, using the delta method applied to ratios of sample means, it can be 
shown that a^{9) = a'f{9) + ct{9, N) /N + {9) /\/iV, where ct{9, N) is varying over t, 9 and N and 
is of order 1 in and Qt (9) is a zero-mean error independent N but with a variance varying with 
t and 9. Hence, for fixed 9, 

M9, T) = Y.ti^l{0)/T = T-^Y.tM{0) + ct{9, N)/N + (t (9) /Vn} (23) 
= ^1^{9,T) + 0{N-^) +OpiT-iN~^). 



We can use the estimator Y (z) = T^Pn{9,T)/N for Y {z) from (HI]) so that Y {z) = Y (z) + 
0{TN^'^) + Op{T2N~2). We consider, see Lemma [U N being 0{T), in which case Y (z) = 
Y (z) + Op{T~^). This result, together with (j20p . implies that z satisfies a central limit theorem in 
T, i.e. 

Y{z)-^/^{z - E{z)} A AA(0, 1) and Y{z)-^/^{z - E{z)} A M{0, 1), (24) 

where E(z) = -V(z)/2 = E(z) + Op{T-^). It should be noted as is 0{T) that the central limit 
operates very quickly in T upon the expression (j20p as the summation is over an increasing number 
of terms, each of which satisfy a central limit. 

Lemma U] provides two consistent estimators of the number of samples necessary to obtain an 
estimate z whose variance is approximately equal to a prespecified target variance k^. We call the 
first a static estimator because the number of samples (or particles) is fixed before the PMCMC 
scheme. We call the second a dynamic estimator because the number of samples depends on the 
proposed value of 9. The static estimator converges, to the target variance, at the rate Op(T~^/^) 
while the dynamic estimator converges at the faster rate Op{T~^). 

Lemma 4. Let z = log pj\[{y\9,u) — logp{y\9) and consider a'^ {9, T, N) := Y{z) for given values of 
9 £ Q and y := {yi, ...^ynp)- We make the following assumptions. 
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(i) For any in 6 & Q, we have 

al{e,T,N) = Tij{e,T)/N + 0{T/N^), (25) 

where 1^(9, T) is 0(1) in T uniformly in 9 £ Q. 
(a) A consistent estimator (9, T) (in N) of 11^(9, T) is available such that, uniformly in 9, 

ijN{e,T) = i:{9,T) + 0[N-^)+Op{T--2N-^2). 

(Hi) For large T, the posterior it{9) satisfies liniT'^oo IE7r(0) = 9 and limj'_j.oo x Yt^{9) = S . 

(iv) We have a consistent estimator (in T) 9t of 9. 

(v) ip{9,T) is a continuously differentiable function of9£@. 
Then, we obtain 

A. under Assumptions (i) to (v). Suppose that an initial starting value = O (T) is used and 
the number of samples N^^ is chosen statically as N^^ := TtPns{^t,T)/k'^ , where is a 
prespecified target variance. Then we have al{9,T,Ng^) = + Op(r~i/2)_ 

B. under Assumptions (i) and (ii). If Nq is chosen dynamically in the PMCMC scheme as 
Ng := rV^TVg, {9,T)/k^ where N^^ = O (T) + Op (1), e.g. as in result A, then al{9,T,Ne) = 

K^ + Op{T~^). 

Proof. See Appendix 18.11 □ 

We note that assumptions (i) and (ii) are justified for the IS estimator by (j2ip and (|23p . We 
would typically choose the constant = 0.92^. The dynamic estimator is preferable in that it 
requires no assumption on the behaviour of the posterior Tr{9). However, practically it is slightly 
more expensive as for each proposed value 9* from q{9*\9j) an IS estimator is ran to determine Ng*. 
For some models this overhead, albeit slight, will be unnecessary. For part (iv) we note that any 
consistent estimator, such as a method of moments estimator, may be employed. 

We examine both estimators for the Probit example of Section 15. 1[ Figure [7] displays the his- 
tograms of the standard deviations corresponding to the static and dynamic estimators for different 
values of T with a target of = 0.92^. We use a consistent estimator 9t for the static scheme. We 
calculate the actual static and dynamic variances Y[logpj\[g {y\9,u)] and Y[log pNg{y\9,u)] respec- 
tively under 9 ~ tt{9), the posterior for 10,000 sample of 9. This yields the histograms of Figure [Tj 
where we take T = 50, 100, 200 and 400. In both cases the variability of the standard deviation, 
around k = 0.92, drops as T increases and the superconsistency results, implied by Lemma [U of 
the dynamic approach also appear to hold. For this example, we found it unnecessary to use the 
dynamic scheme as we have already seen from Figure [6] that the two relevant densities are close to 
their theoretical counterparts as discussed in Section 15.11 These results together with the central 
limit theorem, in T, of (|24p provide strong justification for two parts of Assumption [TJ 

For the PF the decomposition of the log-likelihood error is similar to the decomposition given 
by (jl9p and (j20p for the IS estimator. In particular, for a fixed parameter 9, 

z = \ogpN{y\9,u) - \ogp{y\9) = Er=i{logp5-i('9) - logPt|t_i(6')}, 

where Pt\t-\ = p(yt|yi:t-i) ^) p^j_^(^) is the corresponding PF estimator. We may write 
EN^t = ^{v%-x^Q)IVt\t-m - mcyt{9), where a1(9) = NY[pf^^_^i9)/pt\t^,i9)] obtaining z = 
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StLi log ^1 -\- at{9)e N / . Whilst an expansion similar to ([20]) is possible the terms e^^t are 
correlated across t. This makes the problem of estimating the variance in a dynamic fashion, along 
the lines of Lemma HJ difficult. A potential solution may be to consider running M independent 
particle filters, each using A'^ particles, in parallel. This allows for the variance to be estimated 
straightforwardly. In addition, it is possible to use as the likelihood estimator the average of the 
M independent likelihood estimators. However, in this context, the issue of optimizing computing 
resources may be challenging. 



7 Conclusion 

When one wants to design a PMCMC sampler, the user has to select not only a proposal distribution, 
as in any standard Metropolis-Hastings algorithm, but also the number of samples to use in order to 
estimate the likelihood. Both have a direct impact on the performance of the resulting estimators 
of posterior expectations. We have provided an analysis that suggests that, under reasonable 
assumptions, one should select an log-likelihood estimator whose standard deviation a is around 1 
regardless of the proposal distribution. We have presented two simple schemes to approximately 
achieve this which will work increasingly well as T rises. 

The results provided here also shed light on the design of the proposal. In many scenarios, one is 
interested in designing a proposal which would have a pre-specifie d acceptance probabili t y P if the 
likelihood was known, such at the 0.234 probability advocated in [Roberts &: p'osenthall pOOlh for 



random walk proposals. This is typically achieved practically using adaptive MCMC techniques. In 
the context of PMCMC, this suggests that for a fixed a around 1, one should target an acceptance 
probability P' which is between P/2 and P because P^{A\a'^ = 1) ^ 1/2. 



Acknowledgements 

Arnaud Doucet is grateful for the feedback from participants of the Structure and Uncertainty 
conference at the University of Bristol and the Data Assimilation conference at the University of 
Oxforcfl held in September 2012. Robert Kohn was partially supported by ARC Discovery Grant 
DP 120104014. We would like to thank Eduardo Mendes for his research assistance. 



8 Appendix 
8.1 Proofs 

We provide proofs of Lemma[2l Theorem[Tl Corollary [2] and Corollary[3j In order to prove Theorem[T] 
we also introduce and state the additional Lemmas [5] and [H Lemma [5] is a known result whilst we 
provide the proof of Lemma [H 

Proof of LemmalM The proof that aQ*{6j, zj; 0* , z*) = aQEx{6j;9*) x aQ^{zj; z*), is obtained by 
noting that the joint acceptance of the proposed values z* , 9* for the Q* chain occurs if and 
only if there is acceptance in both stages (a) and (b), which are described in Algorithm [T] of 
Section To complete the proof of (i) aQ{9j, Zj;9* , z*) > aQ*{9j,Zj;9*,z*) follows because 

min{l,a6} > min{l,a} x min{l,6} for any a, 6 > 0. 

To establish the reversibility of the chain Q*, we consider an accepted pair {9*, z*), with 9* ^ 9j 
and z* 7^ Zj, where {9j, Zj) are the current values of 9 and z. The transition density in this case is 

' iPresentationl (slides! from I Pat a Assimilation conference! University of Oxford. 
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given by h{6j, Zj;6* , z*) = aQ*{9j, Zj;6* , z*)gN{z*\9*)q{6*\6j). Then, using invariant joint density 
of®, 

h{ej,Zj;e*,z*)7:N{zj\ej)7r{^j) 

= aQ^{zj;z*)gNiz*\9*)7rN{zj\ej)aQB^ (^,; r )g(r |^,)7r(%) 
= aQ^{zf,z*)gN{z*\e*)7rN{zj\ej)aQ^^ {e*;dj)q{e,\e*)7r{e*), 
as the chain Q^^ is itself reversible, 

= min{l;exp(z* - zj)} gN{z*\e*)7rN{zj\0j)aQEx {e*-9j)q{9j\9*)^{9*) 

= min{l; exp(zj — z*)} exp(2;*)5(Ar(z*|^*) exp(— Zj)7r7v(-Zj|0j)aQEx {9*;9j)q{9j\9*)'7T{9*) 

= aQ^{z*;zj)gN{zj\9j)7rr,{z*\9*)aQEx {9*;9j)q{9j\9*)7r{9*) 

= h{9*, z*;9j,Zj)TTN{z*\9*)TT{9*), 

by noting TrN{z\9) = e^gNiz\9). This establishes the reversibility of the chain Q* and that it has 
the correct invariant density (j4]), proving part (ii). 

To show Part (iii), we note the conditional probability of acceptance for Q is always greater 
than or equal to that of Q* by Part (i) of the lemma. The two chains are reversible and have the 
same invariant distribution. Peskun's theorem implies tha t the lACT for any function h{9j) is equal 
to or greater for the Q* chain than for the Q chain; s ee iPeskun (|l973l l and its generalization to 



continuous state spaces in Theorem 4 of Tierney (jl998l ). 

Parts (iv) and (v) follow from straightforward calculations. □ 

The following lemma is proved in iFinneil (Il992l ^ and is used in the proof of Theorem [TJ 



Lemma 5 (Generalized Holder inequality). Suppose that {zj} is a stationary Markov chain and 
Uj = g{zj) is a real scalar function of Zj. Then, 

IE(n}=il%-|)<]E(|yir) for all T. 

Proof of Theorem [71 For conciseness of notation we do not explicit the dependence on o"^ and the 
function h here because is assumed constant and h remains the same throughout the proof. 
We first outline the structure of the proof. The first inequality IF"^ < IF^* i n (1141) follows 



from Part (iii) of Lemma [2] and Peskun's inequality (see also Theorem 4 of iTiernevI (|l998l )). The 
rest of the proof concentrates on the inequality IF^* < (1 + IF^^ )(1 + IF"^ )/2 — 1. 

For j = l,...,r, define Xj = 1 if Zj = Zj-i and Xj = otherwise. Then {xj,j > 1} is 
conditionally a Bernoulli process that is also semi-Markov because the probability depends on the 
actual Markov chain Zj-i. Then Pr(a;j = l|zj_i) = p{zj-i;a'^), where the expression for the 
rejection probability p(z;(T^) is given in (llip and we write this concisely as p{zj-i). Let Wr = 
xi + ■ ■ ■ + Xr be the total number of rejections over the interval 1 to r. If = t, then there is no 
move and 9r = 9q. 

Due to the stretching influence of the Xj process, which is independent of 9 because the Zj are 
constructed to be independent of 9, it is possible to write the autocorrelations for a function h{9) 
generated by the chain Q* , of Algorithm [H as 

P^\t) = E,„^^_, {El=oP""ir - w) Friwr = w\zo:r-i)} , (26) 

where the expectation is with respect to the joint density for the Markov chain zo:r-i = (zq, zi, Zr-i] 
in the stationary regime, i.e. zq ~ vr, and p^^{t) is the rth autocorrelation of the Q^^ chain, see 

dZD- 
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Under Assumption [H the Markov chain whose reahzation up to time r — 1 is 2o:t-i is 
independent of the Markov chain in Q. From the form of p^^{t) given by ([7]), the expression 
becomes 

p<^*(r) = E,„_, {x:;=o {f',X-~''E{dX)) PrK = Mzo-.r^i)} (27) 
= /^iIE.o... {n;=i {Ei=oA'^^Pr(x, = x\z,.,)]] E{dX) 
= {nj=i«(^.-i; A)} E{dX) = f^,p^*{T; X)E{dX), (28) 

where 

a{z;X) = X+p{z){l-X)=p{z) + X{l-p{z)), (29) 

and 

P^*{t;X)=E,,.,^_, {n.^^«(^,-i;A)} . (30) 
Then the lACT for the process Q* is 

ifQ' = Jl_^ciFQ\x)EidX), cifQ\x) = 1 + 2Er=iP'''(^; A), (31) 

where CIF'^'{X) may be regarded as the inefficiency conditional upon a fixed A and /3'^*(t;A) is 
defined in ()30p . The main aim is to show that 

CIF^'iX) <CIF'^{X), (32) 

where we define 

C7/F^(A) = l + 2Er=i/'^(-^;A), p'P{T;X)=E^{a{z;XY}. (33) 

The superscript V is used as this may itself be thought of as a process. This process will only be 
defined when it is needed later in the proof of Lemma [H 

The inequality of ([32]) is proved in two parts. In Case I, the case of A > 0, is first considered. 
In fact the inequality ([32]) is shown by proving that p^*{t;X) < p^{t;X) where both of these 
conditional autocorrelations are strictly positive. For Case II, the case of A < 0, the proof is a little 
more involved as it is no longer the case that p^ (r; A) < p^ {t\ A). We therefore consider a process 
which has a higher rejection rate but the same invariant distribution. 

Case I: A > 0. In this case < a{z] A) < 1, where a{z; A) is defined in ([29p . Hence, 

p'^*(r; A) = E,„_, {nj=i«(^.-i; A)} < E.{a(^; A)-} = p^(A), 

by applying Proposition [3 Hence the inequality ([32p holds for A > by considering the sums of 
the autocorrelations in the expressions ([3ip and (|33p . 

Case II: A < 0. Lemma[6]below derives the inequality ([32p for a fixed value A < 0. Specifically, 
a reversible Markov chain Vs is constructed, with the same invariant distribution as Q* . The 
conditional acceptance probability, using the same proposal, is constructed to be lower for the chain 
V5 than for Q* . We then consider a limiting form for the chain Vs, described in detail in the proof 
of Lemma [6j The resulting limiting process, "P, is shown to have the lACT of ([33p . Hence the 
inequality ([32]) holds for A < 0. 

Having established ([32p for both cases, it follows that 

jpQ" ^ f\ciF^\X)E{dX) < f\CIF^{X)E{dX) = IF'^ . 
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We now derive the expression for IF^ using ([33|) and noting that 1 — a{z) = (1 — A)(l — p{z)). 



IF'^ = 2 



1 



(l-p(z))(l-A) 



1 + 



il-A 



E{dX) 



I E{dX) - 1 = 2 _l_^(dA)| 



1 



l-p(z) 



l-p{z) 



1 = -(1 + IF^>^)(1 + /F^)-1, 



where /i^^-'^ is given by ([7]) and /F^, an imphcit function of a, is given by (jl2|) . 



□ 



Lemma 6. Consider a TTy -reversible transition density fTiu'lu) of a univariate Markov chain {yj} 
with autocorrelation at lag r given by corr{yj,yj^T-) = A'"^'. In this lemma, X < is considered as 
fixed. 

The densities in z, g{z) and TTziz) = e^g{z), are given by I where a is fixed and discarded 
from the notation. We introduce a term < 5 < 1. Consider the "stretched" joint Markov chain 
Vs, with iterates {yj,Zj}, formed as: 

1. Propose y* ~ friylVj)- 

2. Propose z* ~ giz). 

3. Set yj+i = y* with probability aQz{zj; z*) = min{l, exp(z* — zj)}, of otherwise set yj+i = 
Vj- 

4- Set Zj^i = z* with probability 6 x I{yj+i = V*), otherwise set Zj+i = Zj, where /(•) is an 
indicator function taking value 1 when the argument is true and when false. 

Then: 

(i) The chain {yj,Zj} is reversible with the invariant distribution 7r(y,z) = vrp(y)7r^(2). 
(a) The lACT for the reduced chain {yj}, when 5=1, is given by 

CIFQ'iX) = 1 + 2X:r=iP^*(^; A) where p^'{t;X) = E,„^_, { J]^"^^a(z,_i; A)} , 

where the above expectation is with respect to distribution of zo:r-i i^ the stationary regime, 
a{z; A) = p{z) + A(l — p{z)) and p{z) = 1 — f aQz(z\ z*)g{z*)dz* . 

(Hi) The chain Vs with 6 < 1 has integrated autocorrelation time CIF^^{X) where CIF^ (A) < 
C/F^^(A). 



(iv) 



lim6 iOj^^CIF^'{X)E{dX) = f^^CIF^ {X)E{dX), 

where we define 

C/F^(A)=E. \\^^' 
[1 — a[z; X) 

and E{dX) satisfies the assumptions of TheoremUl 
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Proof of Lemma 0. To prove part (i) we can use a similar proof to that of Lemma [2] to show that 
the chain {yj,Zj} is reversible and has the correct invariant distribution -K^y^z) = vrp (yj+i)7r^(z). 
The transition density of the joint space will be denoted as h{yj+i, Zj+i\yj, zj). 
Case I: y^+i 7^ yj, Zj+i / zj : 

HVj+i^ Zj+i\yj, Zj)7ry{yj)TTz{zj) = 6 x min{l; exp(2;j+i - Zj)}fT{yj+i\yj)g{zj+i)7ry{yj)'Kz{zj) 

Using the reversibility of the original chain so fT{yj+i\yj)T^Y^yi) ~ ^i^d noting 

that '/r^(z) = e^g{z) then 

= Sxmm{g{zj+i)TTz{zj);Trz{zj+i)g{zj)}fT{yj\yj+i)T^Y^yj+'^^ = ^(^i, ^^jlyj+i, ^i+i)7ry (yj+i)vr^(2;j+i). 
Case II: yj+i / yj, zj+i = zj : 

hiVj + l, Zj+l\yj, Zj)TTy{yj)TTz{Zj) 

= (1-6) X min{l;exp(zj+i - Zj)}fT{yj+i\yj)g{zj+i)TTy{yj)TTz{zj) 
= (1 - 5) X mm{g{zj+i)Trz{zj);TTz{zj+i)g{zj)}fT{yj\yj+i)TTY{yj+i) 
= Hyj^Zj\yj+i,Zj+i)TTy{yj+i)Trz{zj+i). 

This completes the proof of Part (i). The proof of Part (ii) is similar to that of the material 
between equations (j26p to (j3ip . in the proof of Theorem [H except that A is now constant so that 
we do not integrate over it. We therefore omit the details. 

For part (i ii) of the proof we note Peskun's inequality which is also used in Lemma [2l see 
( Tierney . 19981 . Theorem 4, p 5). We note that the chains induced by (5 = 1 and 6 < 1 are both 
reversible and have the same invariant distribution. However, by construction, the chain Q* with 
6 = 1 has a higher conditional probability of moving to a given position than the chain Vs with 
S <1. Hence CIFQ*{X) < CIF^^{X). 

For part (iv) we require that if A < then 

■l + a(2;;A)" 

wnere ujr " [aj = i -\- ' 

T = l 



limC/F^*(A) = E„ 



1 — a{z; A) 



where CIF'^' (A) = 1 + 2 p^' (r; A). (34) 



We will examine the limit of p^^ (r; A) as (5 — )• and show that lim^io YlT=iP^^ (''"' ^) ~ '^■k{z) [{^ ~ '^(•^S -^)}~^] • 

The probability in the chain Vs of moving to a new value of z is Prjzj+i = z*} < 5. The 
expected time within a particular regime, the same value of z, is therefore greater than 1/(5, the 
expected time associated with a geometric random variable with success probability 5. Without 
any loss of generality, let Hq = min{j G N"*" s.t. Zj / zq} be the first excursion time for Zj departing 
from zq. We denote a{z;X) = a{z) for brevity and recall that A < is fixed. Then, with /(•) an 
indicator function and p'^^{t) = corr{yQ,yr}, 



(r) = E,„,Ho {l{r < Ho)a{zoy + /(r > Ho)a{zof°E,,^^^,^\,,,Ho {Wj=Ho+M^j)) } • 
where ~ vr and Hq is defined above. Hence 

00 I Ho OD I ^ \ 1 

r=l [t=1 T=Ho+l \j=Ho+l J J 

= E^[(l - a{zo)y^] + Ri{6; A) + R2{6; A), (35) 
where the above expectation is with respect to zq ~ t^{zo) and the remainder terms are 
Ri{S; A) = -E,,^Ho {a(zo)^''+V(l " «(^o))} , 



R2{S;X)=E,,^Ho 



T=Ho+l \j=Ho+l 
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We need to verify that both of these remainder terms vanish as 5 J, 0. We first consider Ri{5; A), 
noting 1 — 0(2:0) = (1 ~ ~p(-2o))- We introduce a new geometrically distributed random variable 
with parameter 6, Hq ^ Ge{6). We note that Hq is stochastically dominated by Hq, Hq >z Hq, i.e., 
P^{Hq > x) > Pr(i?g > x), in step 4 of Lemma [H So by using the moment generating function 
(mgf) of a Geometrically distributed random variate and noting that Hq is independent of all other 
variables 



Jl-A)(l-p(zo))J - " [{l-X){l-p{zo)) 

S\aizo)\^ 1 

(l-(l-5)|a(zo)|)(l-A)(l-p(2o)) [■ 



Hence, by considering positive and negative values of a{zo), the monotone convergence theorem 
yields 



lim(5~^ \Rii6;X)\ < [ 



a{zof 



p(^)=-A/{i-A) 1(1 - a(zo))(l - A)(l -p(zo)) 

p(.)=-A/(l-A) f ^(^^^2 



TT{zo)dzo 

+ L^o i(l + a(.o))(T-A)(l-p(zo))i = + 

Noting that A < in all cases, for the denominator in the integral Ii, where a > 0, it is clear that 

(1 - a(zo))(l - A)(l -p(zo)) = (1 - X)\l-pizo)f > (I-Pizo))'. 

For the denominator in the integral I2, where 0(^0) < 0, its is clear that 

(1 + a(zo))(l - A)(l -p(zo)) = {p{zo) + 1 + A(l -p(zo)))(l - A)(l -p(zo)) 

> p{zo){l - p{zo)) for A < 0. 

For the numerator in Ii and I2 note that 0(2:9) = A + p(2o)(l — A) = p{zo) + A(l — p{zo)), so 

0(20) > 0, 0(20) < p{zo), 0(20)^ < p{zof for h. (36) 
0(20) < 0, |a(2o)| = -A - p(2o)(l - A) < -A, 0(20)^ < A^ for h. (37) 

So 

/•I p(2q)2 Mz)=-x/{i-\) ^2 

lim(5 \Ri{6;X)\< -——Tr{zo)dzo+ — -7r(2o)d2o, 

Jp(z)=-x/{i-x) -p{zo)r Jp{z)=o P{zo)[l - p[zq)) 

(38) 

where is straightforward to verify that both integrals are finite given the specific form of p{zq) 
and 7r(2;o), from p!T]) and pO|) respectively, holding fixed and known. This is shown for these 
particular forms in Lemma [71 Hence lim^^o \ X)\ =0. 

We next consider the remainder term R2 of ()35p . Then, again using an independent geometric 
random variable Hq ~ Ge{5), where Hq >z Hq, 

\R2i5;X)\<E,,,Ho\Hzo)\''' E E o„|.o,Ho( fl Hzj)\]\ 

I T=Ho+l \j=Ho+l J I 



<E,„E^,. io(2o)ro ^ E^ |, I n H^j)\ 
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As a consequence of the generalised Holder inequality, of Proposition [5l we obtain, again using the 
mgf for a geometric random variable, 



R2{6; A)| < E,„,^. <! |a(zo)|^o ^ |a(zo)r-^5 \ = E,„,^,. I E \a{zo)V 



H* 



E 



i«(^o)r« 



m+i 



^ 1 - |a(2:o)| J 
Hence the monotone convergence theorem yields 



E, 



|a(2;o)| 5 



(l-(l-5)|a(zo)|)(l-|a(zo)|) 



\hu8-^ \R2{S\\)\ =E 



|a(^o)r 



+ 



^(l-|«(^o)|)2j 

p(^)=-A/(l-A) i^^^^^p 
p(2)=0 



|a(^o)r 



(l-|a(^o)|)2 



(2)=-A/(l-A) (1 - Hzq)\)' 
TT{zo)dzo = Jl + J2. 



;Tr{zo)dzo 



For the first integral, Ji, where a(zo) > 0, the numerator |a(2;o)| < p, from (|36p . and for the 
denominator 1 — |a(zo)| = (1 — A)(l — p{zo)) > (1 — p(zo)) as A < 0. For the second integral, 
J2, where a(zo) < 0, the numerator |a(zo)| < A, from (j37|) . and for the denominator 1 — |a(zo)| = 
1 +p(zo) + A(l -p(zo)) = 1 + A + p(zo)(l - A). Hence (1 - \a{zo)\)'^ > p{zof{l - and so 



lim^-^ \R2{6;X)\ < 
54.0 



p{z)=l 



p{^)=-A/{l-A) (1 -P(^o))' 



;TT{zo)dzo + 



(1 + A)yp(^)=o 



V{z)=-X/{1~X) ^ 



7r{zo)dzo. 



These integrals are finite, as shown in Lemma [7l To complete the proof of part (iv) it should be 
noted that J^^\{1 + Xy^E{dX) is finite as an assumption of Theorem [T] and so from (j35p , the 
definition of CIF^^ (A) in (I34p and the monotone convergence theorem 



lim/F^* = lim / CIF^'{X)E{dX) 



E^ 



1 — a{z; A) 



l\E{dX) = I' 



E^ 



l + a(-g;A) 
1 — a{z; A) 



E{dX) 



as required. 



□ 



Lemma 7. For fixed a > let f{w) = ^{w) — ex.p{—a'^ /2 — aw)^{w) , where is the cumulative 

distribution function of the standard normal. 

(a) Then, the following integrals are finite, 

JZo<P{w){{l - f{w)fr'dwJZc^{w){f{w){l - f{w))}-'dw, and fZc^{w){f{w)}~'dw 

(h) Let 7r(z) and p{z) he defined as in Lemma\^ Then the following integrals are finite, 

J^^,r{z){{l-p{z)fr^dz,j^^7^{z){p{z){l-p{z))}-Uz, and j^^7,{z){p{z)r^dz. 

Proof. Part (b) follows from Part (a) by using the transformation w = {z — a'^/2)/a. We now 
obtain Part (a). The following results are used in the proof and are straightforward to verify, (i) 
4>{w) exp(o"w — o"^/2) = ((){w + o"). (ii) The function f{w) is monotonically increasing in w because 
its first derivative is positive, (iii) \\m.yj^_ao f {w) = and lim^-|-oo /(u^) = 1- (iv) For w very large 
positive, ^{—w) ~ (f){w)/w with the equality becoming exact as w f cc- The last result is obtained 
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from the following inequalities derived for w > from Formula 7.1.13 in lAbramowitz &: Stegun 
£97^), 

< < . 

The proof proceeds by noting that the integrals 

/f^<AH{(l - fHfr'dw,J^^ct>{w){fiw){l - fiw))}-^dw, and /_VH{/M}"'^^ 

are finite for any fixed /3 > 0, however large. This can be shown in a straightforward way because 
/{w) is monotonically increasing and bounded by and 1. For the rest of the proof we take /3 > 
very large. In the interval (—00, —(3), 



fiw) = ^{w + a) - exp(-o-^/2 - wa)^{w) 



exp(— (7 / z — wa)- 



-a — w 



-w 



{w + a) 



a 



■w{w + a) 



Hence , 4>{w)/f{w) "'(^jH-o-) gxp(-W(7 + cr^/2). In addition, 1 — f{w) 1, so that the following 
integrals are finite, 

/r^0H{(l - f{w)f}-'dwj:^<f^iw){f{w){l - f{w))}-'dw, and f:^cp{w){f{w)}-'dw. 

Finally, we consider the interval (/3,oo). In this interval 

1 - f{w) = ^{-w - a) + exp(-o-^/2 - wa)^{w) ^ (t){w + a)/{w + a) + exp(-o-^/2 - wa) 
exp(— (7^/2) exp(— WO")) 

and f{w) > f{f3). Hence, the following integrals are finite, 

/~</.H{(l - /M)2}-idu;,/~0M{/H(l - fiw))}-'dw, and j;^^iw){fiw)}-'dw. 



□ 



Proof of Corollary IE From ([9]), P'^{A\a'^) > P^*{A\a'^) and 



. , ^ exp(z*) 
mm 1; — - I x mm 



exp(zj) 



) X g(r|^,)^(0,)drd0,5(z*|a)7r(z,»fiz*dz,- 



(exp(z* ) 
1; ; — - ) g(z*\a)7r(zj\a)dz*dzj x / min 
exp(zj) ' 

P^{A\a'^)P^''{A), 



ijj {9j ; I 



and P^{A\a^) = 2^>(-(t/\/2) by Lemma 3, Part (fi) of lPitt et al.l ^2012 ). 



)de*dBj 



□ 



Proof of Corollary For notational convenience we omit the subscript h in our notation. Part (i) 
follows from 

u 2. _IF''{a'^) _{l + IF^'^){l + IF^{a'^)) 1 _ {I + I F^"^) I {a'^) ^ (/F^^ - 1) 



CT''{a^) 



2ct2 



+ 



Pitt et al.1 (|2012l ) show that 



da 



a"- 



and 



52 //F' 



o-=(Topt=0.92 



> 0. 



2(72 



(39) 
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Rather than minimizing CT^{a ), we focus on equivalently minimizing RCT^ {a ) = CT^{a )/IF 



dRCT'^ia^) _ + d [ IF^{a^)\ (IF^^^l) 

EX Y^ 



da 



IIF^"" da 



a^ 



d'^RCT^[a'^) _ + 5^ fIF^{a^)\ {IF 



da^ 



2IF^ 



da^ 



a" 



We obtain that RCT^ (a^) / da^ > using ([39]) and assuming that IF^^ > 1. When IF^"^ = 1, 



dRCT'^ia^) d fIF^{a^) 



a^ 







da da 

at CTopt = 0.92 estabhshing Part (ii). To obtain Part (iii), we note that 



hm 



dRCT'^ia') I d (IF^{a')\ 1 



da 



2 da 



a^ 



a-^ 



which we can verify numericahy is equal to at cJopt = 1.0206. In general, for IFf^^ > 1 



dCT'^ia'^) 



da 2 
and so the optimal value cJopt satisfies 

+ d fIF^{a^] 



da 



a^ 



{IF^ 



da 



{IF^ 



(40) 



where 



d_ flF^ia"^) 
d^ 



a^ 



> 0, 



0"=0"opt 



for CJopt > 0.92 using 

d ( dCT''{a^) 



dlF^^ \ da 



0"=(Topt ^ 



1 d fIF^{a^' 

2 da 



1 



O"=0'opt 



a" 



1 d (IF^{a 

2 'da 



0=0 opt 

1 



a^ 



< 



Topt 



from the first order condition (j40p . As a consequence, the optimal value for a increases with IF^ 
which verifies Part (iii). Finally, to obtain Part (iv), it is straightforward to see that 



EX 
h 



1 



J^EX 



2/FE 



2IF^ 



IF^{a^) + l^IF^{a^)-l 



2 ' 2/FEx ' 

so that RIF^{a^) and RCT'^{a'^) = RIF^{a^)/a^ are decreasing functions of IF"^^, holding a 



constant. 

Proof of Lemma \^ Assumption (ii) yields 



□ 



N-, 



= Ti^MsiOr, T)/k' = Ti;{eT, T)/k' + O (T/Ns) + OpiT^N, 



24 



As Ns is chosen proportional to T then consequently, N^^ is of order T. Using assumption (i), the 
achieved variance at any ordinate 9 with A'^^^ particles is 

4(0,r,iV^^) = Ti;{e,T)/N^^ + 0{T/Nl) = ^^^(0, r)/V'(^T, T) + Op(r-i). (41) 

Using Assumptions (iii) to (v), we obtain from a Taylor expansion of ip{9, T) around 9t in (I4ip that 

al{e,T,N^^) = K^ + Op{T~^. 
This establishes part A. For part B, we may write 

Ne = T^Ng^ {6, T)/^ = Tip{9, T)/^ + Op (1) , 
since N^^ is of order T. Hence, the achieved variance at a given 9 and T is given by 
alio, T, Ne) = ^9, T)/Ng + 0(r/Ar|) = + Op{T~'). 

□ 
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Figure 1: Theoretical results RCT^ (top) and RIF^ (bottom) against 1/cr^ (left) and a (right). 
Different values of IF^^ are shown on each plot. 
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p = 


p = 0.97 


Exact (MCMC) 


IF = 


1.103, Pr(Acc) = 0.94 


IF = 48.11, Pr(^ 


-c) = 0.98 








PMCMC 










N 


aiO; N) 


IF 


Pic{Acc) 


CT 


IF 


Ft{Acc) 


CT 


23 


2.27 


128.0 


0.12 


2944.0 


656.92 


0.12 


15109 


32 


1.81 


62.55 


0.19 


2001.7 


363.44 


0.20 


11630 


44 


1.54 


26.03 


0.27 


1145.3 


169.95 


0.27 


7477.9 


62 


1.26 


10.88 


0.35 


674.75 


133.72 


0.36 


8290.4 


86 


1.06 


6.581 


0.43 


565.65 


100.03 


0.44 


8602.4 


120 


0.91 


4.331 


0.49 


520.11 


97.360 


0.51 


11683 


167 


0.77 


3.174 


0.56 


529.60 


72.966 


0.58 


12185 


232 


0.65 


2.478 


0.62 


571.97 


69.931 


0.63 


16224 


323 


0.55 


2.042 


0.67 


658.83 


65.834 


0.69 


21264 


449 


0.47 


1.719 


0.71 


769.82 


59.691 


0.73 


26801 


625 


0.39 


1.602 


0.75 


1000.1 


56.664 


0.77 


35415 


869 


0.33 


1.521 


0.78 


1321.2 


53.233 


0.80 


46259 


1209 


0.28 


1.345 


0.81 


1616.5 


51.571 


0.83 


62350 



Table 1: Prohit example with a fixed dataset and T = 100. The table shows the results for the 
'Exact' MCMC sampling scheme and for the PMCMC sampling schemes using two values of p 
for the proposal, p = (left) and p = 0.97 (right). For the PMCMC sampling scheme the table 
reports for differing values of N, the probability of acceptance Pt(Acc) and the computing time 
CT = N X IF. See Section [5J\ and the corresponding plots of Figures [2| and El 



Exact 


IF{(t>) 


IF{p) 


IF{a,) 




Pr{Acc) 




2.5845 


2.5040 


2.4163 




0.7678 


PMCMC 








N 


a 






IF{p) 


IF{a^) 


CT{<j)) 


CT{p) 


CT{a^) 


Pr{Acc) 


11 


2 


2886 


136.32 


132.41 


128.66 


1499.5 


1456.5 


1415.3 


0.11424 


16 


1 


8692 


61.403 


63.756 


66.609 


982.45 


1020.1 


1065.7 


0.19036 


22 


1 


6063 


37.256 


40.486 


37.367 


819.63 


890.68 


822.07 


0.25549 


31 


1 


3412 


15.880 


18.099 


19.135 


492.29 


561.08 


593.20 


0.32622 


43 


1 


1096 


11.320 


9.7400 


10.710 


486.75 


418.82 


460.54 


0.39347 


60 





9197 


7.5040 


8.0428 


7.6168 


450.24 


482.57 


457.01 


0.45933 


83 





8058 


5.7253 


5.5841 


5.9348 


475.20 


463.48 


492.59 


0.50885 


116 





6828 


4.3756 


4.7106 


4.1693 


507.57 


546.43 


483.63 


0.56621 


161 





5828 


3.8112 


4.2379 


3.6388 


613.61 


682.30 


585.84 


0.60160 


224 





4838 


3.2711 


3.1605 


3.3134 


732.73 


707.94 


742.19 


0.63562 


312 





4096 


3.0774 


3.4768 


2.8355 


960.14 


1084.8 


884.67 


0.65793 



Table 2: ARl plus noise example with proposal parameter p = 0,T = 300, (p = 0.8, p = 0.5, = 1 
and cj^ = 0.5. InefRciency (IF) and computing time CT = N x IF shown for 9 = {(p,p,axy. See 
Figures [H and [5l 
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Exact 






IF{a.,) 




rTyJ\CC ) 




25.59 


22.21 


24.44 




U.o 1 1 1 1 


PMCMC 








N 


aie;N) 


IF{(t)) 


IF{fi) 


IF{a.^) 


CT{<P) 


CT{fi) 


CT{a^) 


Pr{Acc) 


11 


2.2886 


594.64 


488.30 


639.04 


6541.1 


5371.3 


7029.5 


0.12579 


16 


1.8692 


156.18 


157.65 


182.07 


2498.9 


2522.4 


2913.1 


0.20410 


22 


1.6063 


112.01 


91.447 


109.51 


2464.2 


2011.8 


2409.2 


0.27279 


31 


1.3412 


76.989 


71.276 


72.055 


2386.7 


2209.6 


2233.7 


0.35385 


43 


1.1096 


53.297 


58.278 


55.612 


2291.8 


2506.0 


2391.3 


0.42577 


60 


0.9197 


49.351 


47.476 


45.194 


2961.1 


2848.6 


2711.6 


0.49610 


83 


0.8058 


37.709 


29.550 


38.266 


3129.8 


2452.7 


3176.1 


0.55764 


116 


0.6828 


29.360 


36.943 


34.892 


3405.8 


4285.4 


4047.4 


0.61174 


161 


0.5828 


28.277 


27.883 


29.864 


4552.6 


4489.2 


4808.1 


0.65704 


224 


0.4838 


27.77 


29.471 


30.533 


6220.5 


6601.5 


6839.4 


0.69674 


312 


0.4096 


29.231 


25.549 


29.967 


9120.2 


7971.4 


9349.8 


0.73057 



Table 3: ARl plus noise example with proposal parameter p = 0.9, T = 300, (j) = 0.8, /.i = 0.5, = 
1, o"^ = 0.5. The inefhciency (IF) and computing time CT = N x IF shown for 6 = {(j), n, (Jx)' ■ See 
Figures [H and O 




Figure 2: Probit example T = 100, 9 = 0.5. Probabilities of acceptance displayed against (t{0), the 
standard deviation of the log-likelihood estimator. The estimated probabilities for the exact MCMC 
scheme is shown (constant ) together with the estimated probabilities from the MCMC scheme. The 
lower bound is given as the probability from the exact scheme times 2^{—a/V2). The proposal 
autocorrelation p = 0, 0.4,0.6,0.9,0.97. Differing (j'^{6) are calculated by varying the number of 
samples N. See TaWed 
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Figure 3: Prohit results for fixed dataset with T = 100. BOTTOM: Relative inefficiency RIF^ 
plotted against N (left) and against a{6) (right). TOP: Relative computing time RCT^ plotted 
against N (left) and against a{9) (right). Various value of p, for the AR(1) Metropolis proposal 
are considered. Differing (t'^{9) are calculated by varying the number of samples N. See TaWe[TJ 




Figure 4: ARl plus noise example with T = 300,0 = 0.8, /x = 0.5, = l,a^ = 0.5 . Proba- 
bilities of acceptance displayed against a{6). The estimated (constant) probabilities for the exact 
MCMC scheme are shown together with the estimated probabilities from the PMCMC scheme. The 
lower bound is given as the probability from the exact scheme times 2$(— (t/\/2). The proposal 
autocorrelations are p = 0, 0.4,0.6,0.9. See TabJes [2| and [21 
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Figure 5: ARl plus noise example with T = 300, (/> = 0.8, fi = 0.5, = 1, = 0.5. FromLEFT 
to RIGHT: RCT^ against N, RCT^ against a(e), RIF^ against N, RIFj^ against a(e). The 
three plots on all graphs have h corresponding to each of the parameters {<j), iJL,ax)' ■ From TOP 
to BOTTOM: p = 0, 0.4, 0.6 and 0.9. Here a(d) is the standard deviation of the log-likelihood 
estimator evaluated at the posterior mean. RIF^^ is the relative inefEciency and RCT^ is the 
relative computing time RIF^ x N. See Tables [21 and [3l 
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Figure 6: Histograms for the accepted and proposed vaules of Z. TOP: Probit example with N=120. 
BOTTOM: ARl plus noise example with N=60. So cr'^{9) = 0.92^. Z is recorded as the error in 
the log-likelihood for each MCMC draw of 6. The theoretical Gaussian densities for the proposal 
N{-a'^(9)/2;a'^(9)) and the accepted values N{a^(0)/2;a'^(9)) are overlaid. 
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Figure 7: Probit model of Section I5.il Histograms for the standard deviation of the log-hkehhood 
(over the posterior draws of 6) when N is chosen staticaUy (blue ) for the proposal as and when 
it is chosen dynamically (pink) as Ng. The target for the standard deviation is 0.92 and we vary 
the number of observations T. 
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